143 research outputs found

    The Expected Sample Allele Frequencies from Populations of Changing Size via Orthogonal Polynomials

    In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by the Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for step-wise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.

    Maximum likelihood (ML) estimators for scaled mutation parameters with a strand symmetric mutation model in equilibrium

    With the multiallelic parent-independent mutation-drift model, the equilibrium proportions of alleles are known to be Dirichlet distributed. A special case is the biallelic model, in which the proportions are beta distributed. A sample taken from these models is then Dirichlet-multinomially or beta-binomially distributed, respectively. Maximum likelihood (ML) estimators for the mutation parameters of the biallelic parent-independent mutation model are available via an expectation maximization algorithm. Assuming small scaled mutation rates, the distribution of a sample of size M can be expanded in a Taylor series of first order. Then the ML estimators for the two parameters in the biallelic model can be expressed using the site frequency spectrum. In this article, we go beyond parent-independent mutation and analyse a strand-symmetric mutation model with six scaled mutation parameters that deviates from parent-independent mutation and, generally, from detailed balance. We derive ML estimators for these six parameters assuming mutation-drift equilibrium and small scaled mutation rates. This is the first time that ML estimators are provided for a mutation model more complex than parent-independent mutation.
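
The beta-binomial sampling distribution mentioned in this abstract can be illustrated with a minimal sketch. It is not the article's estimator; it only evaluates the probability of observing k copies of one allele in a sample of size M when the population proportion is beta distributed. The function names and the parameter names `alpha` and `beta` (standing in for the two scaled mutation parameters) are assumptions made for illustration.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, m, alpha, beta):
    """Probability of k focal alleles in a sample of size m.

    Assumes the population allele proportion is Beta(alpha, beta)
    distributed; alpha and beta are hypothetical names for the two
    scaled mutation parameters of the biallelic model.
    """
    log_choose = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
    return exp(log_choose + log_beta(k + alpha, m - k + beta) - log_beta(alpha, beta))


def sample_spectrum(m, alpha, beta):
    """Full sample allele frequency distribution for k = 0, ..., m."""
    return [beta_binomial_pmf(k, m, alpha, beta) for k in range(m + 1)]
```

With small values such as `sample_spectrum(10, 0.01, 0.02)`, most of the probability mass sits on the two monomorphic classes (k = 0 and k = M), which is the regime in which a first-order expansion in the mutation parameters is plausible.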

    Maximum likelihood estimators for scaled mutation rates in an equilibrium mutation-drift model

    The stationary sampling distribution of a neutral decoupled Moran or Wright-Fisher diffusion with neutral mutations is known to first order for a general rate matrix with small but otherwise unconstrained mutation rates. Using this distribution as a starting point we derive results for maximum likelihood estimates of scaled mutation rates from site frequency data under three model assumptions: a twelve-parameter general rate matrix, a nine-parameter reversible rate matrix, and a six-parameter strand-symmetric rate matrix. The site frequency spectrum is assumed to be sampled from a fixed size population in equilibrium, and to consist of allele frequency data at a large number of unlinked sites evolving with a common mutation rate matrix without selective bias. We correct an error in a previous treatment of the same problem (Burden and Tang, 2017) affecting the estimators for the general and strand-symmetric rate matrices. The method is applied to a biological dataset consisting of a site frequency spectrum extracted from short autosomal introns in a sample of Drosophila melanogaster individuals. Comment: 39 pages, 4 figures, simulation to test accuracy of the model added

    Nuclear and plastid haplotypes suggest rapid diploid and polyploid speciation in the N Hemisphere Achillea millefolium complex (Asteraceae)

    Background: Species complexes or aggregates consist of a set of closely related species, often of different ploidy levels, whose relationships are difficult to reconstruct. The N Hemisphere Achillea millefolium aggregate exhibits complex morphological and genetic variation and a broad ecological amplitude. To understand its evolutionary history, we study sequence variation at two nuclear genes and three plastid loci across the natural distribution of this species complex and compare the patterns of such variations to the species tree inferred earlier from AFLP data. Results: Among the diploid species of A. millefolium agg., gene trees of the two nuclear loci, ncpGS and SBP, and the combined plastid fragments are incongruent with each other and with the AFLP tree, likely due to incomplete lineage sorting or secondary introgression. In spite of the large distributional range, no isolation by distance is found. Furthermore, there is evidence for intragenic recombination in the ncpGS gene. An analysis using a probabilistic model for population demographic history indicates large ancestral effective population sizes and short intervals between speciation events. Such a scenario explains the incongruence of the gene trees and species tree we observe. The relationships are particularly complex in the polyploid members of A. millefolium agg. Conclusions: The present study indicates that the diploid members of A. millefolium agg. share a large part of their molecular genetic variation. The findings of little lineage sorting and lack of isolation by distance are likely due to short intervals between speciation events and close proximity of ancestral populations. While previous AFLP data provide species trees congruent with earlier morphological classification and phylogeographic considerations, the present sequence data are not suited to recover the relationships of diploid species in A. millefolium agg. For the polyploid taxa many hybrid links and introgression from the diploids are suggested.

    Allopolyploid speciation and ongoing backcrossing between diploid progenitor and tetraploid progeny lineages in the Achillea millefolium species complex: analyses of single-copy nuclear genes and genomic AFLP

    Background: In the flowering plants, many polyploid species complexes display evolutionary radiation. This could be facilitated by gene flow between otherwise separate evolutionary lineages in contact zones. Achillea collina is a widespread tetraploid species within the Achillea millefolium polyploid complex (Asteraceae-Anthemideae). It is morphologically intermediate between the relic diploids, A. setacea-2x in xeric and A. asplenifolia-2x in humid habitats, and often grows in close contact with either of them. By analyzing DNA sequences of two single-copy nuclear genes and the genomic AFLP data, we assess the allopolyploid origin of A. collina-4x from ancestors corresponding to A. setacea-2x and A. asplenifolia-2x, and the ongoing backcross introgression between these diploid progenitor and tetraploid progeny lineages. Results: In both the ncpGS and the PgiC gene tree, haplotype sequences of the diploid A. setacea-2x and A. asplenifolia-2x group into two clades corresponding to the two species, though lineage sorting seems incomplete for the PgiC gene. In contrast, A. collina-4x and its suspected backcross plants show homeologous gene copies: sequences from the same tetraploid individual plant are placed in both diploid clades. Semi-congruent splits of an AFLP Neighbor Net link not only A. collina-4x to both diploid species, but also some 4x individuals in a polymorphic population with mixed ploidy levels to A. setacea-2x on one hand and to A. collina-4x on the other, indicating allopolyploid speciation as well as hybridization across ploidal levels. Conclusions: The findings of this study clearly demonstrate the hybrid origin of Achillea collina-4x and the ongoing backcrossing between the diploid progenitor and their tetraploid progeny lineages. Such repeated hybridizations are likely the cause of the great genetic and phenotypic variation and ecological differentiation of the polyploid taxa in Achillea millefolium agg.

    Inference in population genetics using forward and backward, discrete and continuous time processes

    A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by, e.g., the population size, mutation rates and selection coefficients. A large class of models use forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have been rarely used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials of the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations or from multiple populations that split in the past can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios. Funding: DS and CK were partially funded by FWF-P24551-B25. CK has been partially funded by the Vienna Science and Technology Fund (WWTF) through project MA16-061. Postprint. Peer reviewed.
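
The remark that discrete models are limited by the population size can be made concrete with a small sketch of the forward half of such a calculation. This is not the authors' combined forward-backward procedure; it assumes a haploid biallelic Wright-Fisher model with hypothetical mutation probabilities `mu` and `nu`, propagates the distribution of population allele counts forward one generation at a time, and then evaluates a binomial sampling likelihood. All function names are illustrative.

```python
from math import comb


def wright_fisher_matrix(n, mu, nu):
    """(n+1) x (n+1) transition matrix for allele counts in a population of size n.

    Assumption: haploid biallelic model; mu is the per-generation mutation
    probability towards the focal allele, nu the probability away from it.
    """
    rows = []
    for i in range(n + 1):
        x = i / n
        p = x * (1 - nu) + (1 - x) * mu  # focal allele frequency after mutation
        rows.append([comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)])
    return rows


def forward_step(dist, matrix):
    """Propagate the allele-count distribution by one generation."""
    size = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def sample_likelihood(dist, k, m):
    """Probability of k focal alleles in a binomial sample of size m."""
    n = len(dist) - 1
    return sum(
        w * comb(m, k) * (i / n) ** k * (1 - i / n) ** (m - k)
        for i, w in enumerate(dist)
    )
```

Each call to `forward_step` costs on the order of n squared operations, so the work grows quickly with the population size n. This is the limitation that an expansion of the transition density in orthogonal polynomials of the order of the sample size avoids in the continuous setting.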